Abstract. The problem of estimation error in portfolio optimization is discussed in the limit where the portfolio size N and the sample size T go to infinity such that their ratio is fixed. The estimation error strongly depends on the ratio N/T and diverges at a critical value of this parameter. This divergence is the manifestation of an algorithmic phase transition; it is accompanied by a number of critical phenomena and displays universality. As the structure of a large number of multidimensional regression and modelling problems is very similar to portfolio optimization, the scope of the above observations extends far beyond finance and covers a large number of problems in operations research, machine learning, bioinformatics, medical science, economics, and technology.

PACS. 02.50.Tt Inference methods - 05.40.-a Fluctuation phenomena, random processes, noise, and Brownian motion - 89.65.Gh Economics; econophysics, financial markets, business and management
If we do not have sufficient information, we are not able to take an optimal decision, nor to build a reliable model. The purpose of this paper is to turn this trivial remark into a quantitative statement. In contrast to the usual set-up of classical statistics, where the dimension N of the problem is considered fixed (and possibly small) while the sample size T is assumed to be very large, we consider a situation which is much more realistic in the context of complex systems: we assume that N is large and T is limited, at most commensurate with N. That is, we consider the 'thermodynamic limit' where N/T is fixed while both N and T go to infinity. It will be seen that the estimation error depends on the ratio N/T so strongly that at a critical value of this ratio the estimation error actually diverges. This divergence has been recognized [1] as the manifestation of an algorithmic phase transition [2]. As such, it is accompanied by a number of critical phenomena. The critical index of the divergence of the estimation error is universal: it does not depend on the details of the various models considered, such as the covariance structure of the market [3], the risk measure used [4], [3], [5], or the nature of the underlying stochastic process [6]. All these features have been illustrated on the example of portfolio optimization, but, in fact, there
is nothing special about the financial aspect of the problem: the same issue arises in any other optimization where the parameters of the cost function have to be determined from empirical observations and only a limited amount of information is available. The obvious relationship between quadratic optimization and linear regression allows us to extend these conclusions to the regression problem as well.

The paper is organized as follows. In Sec. 2 we recapitulate the most important results obtained for the quadratic optimization problem, while in Sec. 3 we enlarge the scope of our study, display the obvious connections between optimization and regression, and cast them into a form that allows us to formulate them as a problem in statistical mechanics. This way, a whole arsenal of powerful methods (scaling, phase transition concepts such as universality, random matrices, replicas) becomes available for the treatment of the ubiquitous problem of estimation error. The paper ends with a short summary in Sec. 4.



2 A simple quadratic optimization problem 



Let us consider the simplest portfolio optimization problem:

$$ \min_{\{w_i\}} \sigma_p^2 = \min_{\{w_i\}} \sum_{i,j=1}^{N} w_i \sigma_{ij} w_j, \qquad (1) $$

subject to

$$ \sum_{i=1}^{N} w_i = 1. \qquad (2) $$

The solution is the minimal variance portfolio with optimal weights given by

$$ w_i^{*} = \frac{\sum_{j=1}^{N} \sigma^{-1}_{ij}}{\sum_{j,k=1}^{N} \sigma^{-1}_{jk}}. \qquad (3) $$
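Eq. (3) is easy to check numerically. The sketch below assumes NumPy and a small hypothetical 3-asset covariance matrix; it obtains the weights by solving σz = 1 instead of inverting σ, and one can verify that σw* is proportional to the vector of ones, which is the Lagrange (stationarity) condition of (1) under the budget constraint (2).

```python
import numpy as np


def min_variance_weights(sigma: np.ndarray) -> np.ndarray:
    """Eq. (3): w* = sigma^{-1} 1 / (1^T sigma^{-1} 1); short positions are allowed."""
    ones = np.ones(sigma.shape[0])
    z = np.linalg.solve(sigma, ones)   # z = sigma^{-1} 1, without forming the inverse
    return z / z.sum()                 # normalize so that the budget constraint (2) holds


def portfolio_variance(w: np.ndarray, sigma: np.ndarray) -> float:
    """Objective of Eq. (1): sum_ij w_i sigma_ij w_j."""
    return float(w @ sigma @ w)


# Hypothetical 3-asset covariance matrix, for illustration only.
sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
w_star = min_variance_weights(sigma)
print("weights:", w_star.round(4), "sum:", w_star.sum())
print("variance:", portfolio_variance(w_star, sigma))
```

Any other budget-respecting portfolio, for instance equal weights, has a variance at least as large.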



Several remarks are in order: 



- The use of the variance as a risk measure assumes that the underlying stochastic process is Gaussian, or at least has a similarly narrow distribution. It is well known that financial fluctuations often have fat-tailed distributions, for which the variance is not an adequate characteristic of risk. Most of what we are going to describe below is valid, mutatis mutandis, also for a number of other convex risk measures. A remark concerning other risk measures will be made at an appropriate point below.

- Normally one does not seek the global minimal variance portfolio, but a portfolio that has minimal variance for a given value of the expected return. The constraint on the expected return has been dropped here for simplicity. It should be noted, however, that the optimization task as spelled out here does appear in an index tracking context.

- The constraint on the portfolio weights only stipulates that they sum to unity; we do not assume that they are positive. That is, we allow unlimited short positions. The case of excluded short selling will be discussed later.

- The problem above will be regarded more as a representative of a wide class of convex optimization problems than as a realistic problem in finance.

The covariance matrix in Eq. (1) has to be determined from observations on the market. For a portfolio of size N we need O(N^2) elements. Time series of length T for N assets contain N × T input data. For the statistical estimate to be reliable we obviously need NT ≫ N^2, that is, T ≫ N. Real-life banking portfolios are large: they may contain several hundred elements, whereas the available time series are always limited. The length and the frequency of the time series are dictated by considerations of stationarity and transaction costs, respectively. In practice T is never longer than T = 1000 (corresponding to four years' worth of daily data), and often much shorter. (E.g., the EWMA method advocated by RiskMetrics [7] starts to cut off around a three-month time horizon.) Therefore, the inequality N/T ≪ 1 almost never holds in practice, and we have to live with the consequences of this information deficit. In particular, our empirical covariance matrix and the portfolio weights obtained from it will contain a lot of estimation error, and the resulting portfolio will be suboptimal.

This problem is, of course, not new: economists have been struggling with this 'curse of dimensions' ever since the appearance of rational portfolio choice [8]. Since the root of the problem is the lack of sufficient information, the remedy is to inject external information into the estimate. This means imposing some structure on σ_ij. This introduces bias, but the beneficial effect of noise reduction may compensate for it. Elton and Gruber [9] give a comprehensive review of the main methods. Our focus in this paper is the quantitative characterization of the problem, not the analysis of its remedies.
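To make the bias/noise trade-off concrete, here is a minimal sketch of one way of "imposing structure": shrinking the empirical covariance matrix toward a scaled identity. The target, the fixed intensity alpha, the sizes and the seed are all assumptions chosen for illustration; this is not one of the methods reviewed in [9]. The quality measure is the ratio q0 defined in Eq. (5). Because the true returns here are iid, the target happens to coincide with the true structure, which is the most favourable case; with a wrong target the bias would show up.

```python
import numpy as np


def min_variance_weights(sigma):
    z = np.linalg.solve(sigma, np.ones(sigma.shape[0]))
    return z / z.sum()


def shrink_to_identity(s, alpha):
    """(1 - alpha) * S + alpha * (tr(S)/N) * I; the target is an assumed, illustrative structure."""
    n = s.shape[0]
    target = (np.trace(s) / n) * np.eye(n)
    return (1.0 - alpha) * s + alpha * target


def q0(w_est, w_true, sigma_true):
    """True risk of the estimated portfolio relative to the true minimal risk."""
    return float(np.sqrt((w_est @ sigma_true @ w_est) / (w_true @ sigma_true @ w_true)))


rng = np.random.default_rng(0)
N, T = 100, 150                              # N/T = 2/3, well inside the noisy regime
sigma_true = np.eye(N)                       # iid returns: true covariance is the unit matrix
w_true = min_variance_weights(sigma_true)    # equal weights 1/N
x = rng.standard_normal((T, N))              # one simulated sample of T observations
s = x.T @ x / T                              # empirical covariance matrix

results = {}
for alpha in (0.0, 0.5, 1.0):
    w_est = min_variance_weights(shrink_to_identity(s, alpha))
    results[alpha] = q0(w_est, w_true, sigma_true)
    print(f"alpha = {alpha:.1f}: q0 = {results[alpha]:.3f}")
```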

In the course of the analysis of the above optimization problem we have used both analytic and numerical methods. On the analytical side we have applied methods borrowed from statistical physics, such as random matrix theory, phase transition theory, replicas, etc. On the numerical side we have used simulated data, so as to have full control over the underlying stochastic process and to avoid problems related to non-stationarity and other imperfections unrelated to estimation error. For simplicity, we have mostly used iid normal variables, but we have considered other, more realistic underlying processes as well.

For such simple underlying processes the true risk measure can be calculated exactly. To construct the empirical risk measure

$$ \sum_{i,j=1}^{N} w_i \sigma^{(1)}_{ij} w_j, \qquad \sigma^{(1)}_{ij} = \frac{1}{T} \sum_{t=1}^{T} x_{it} x_{jt}, \qquad (4) $$

where x_{it} is the return of asset i at time t, we generate long time series of the returns x and cut out segments of length T from them, as if making observations on the market. From these "observations" we construct the empirical risk measure and optimize our portfolio under it.
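The observation procedure just described can be sketched as follows. This is a minimal version assuming NumPy, iid standard normal returns with known zero mean (so no demeaning, matching the 1/T form of (4)), and consecutive non-overlapping windows; the function names and sizes are illustrative, not taken from the paper.

```python
import numpy as np


def sample_returns(n_assets, length, rng):
    """Long iid standard-normal return series (rows = time, columns = assets)."""
    return rng.standard_normal((length, n_assets))


def empirical_covariance(window):
    """Eq. (4): sigma^(1)_ij = (1/T) sum_t x_it x_jt for zero-mean returns."""
    t = window.shape[0]
    return window.T @ window / t


def observation_windows(series, t):
    """Cut the long series into consecutive, non-overlapping segments of length T."""
    n_windows = series.shape[0] // t
    return [series[k * t:(k + 1) * t] for k in range(n_windows)]


rng = np.random.default_rng(1)
N, T = 50, 200
series = sample_returns(N, 20 * T, rng)
windows = observation_windows(series, T)
covs = [empirical_covariance(w) for w in windows]   # one "market observation" per window
print(len(covs), covs[0].shape)
```

Each matrix in `covs` is noisy on its own; only their average over many windows approaches the true unit matrix.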

The ratio q0 of the true risk of the portfolio optimized under the empirical risk measure to the true risk of the truly optimal portfolio is a measure of the estimation error due to noise:

$$ q_0 = \sqrt{ \frac{ \sum_{i,j=1}^{N} w_i^{(1)*} \sigma^{(0)}_{ij} w_j^{(1)*} }{ \sum_{i,j=1}^{N} w_i^{(0)*} \sigma^{(0)}_{ij} w_j^{(0)*} } }. \qquad (5) $$

Here σ^(0)_ij is the exact covariance matrix and w^(0)*_i are the portfolio weights optimized under σ^(0)_ij; σ^(1)_ij is the empirical covariance matrix and w^(1)*_i are the optimal weights corresponding to σ^(1)_ij. The quantity defined in (5) plays a role of central importance in our considerations.
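A direct transcription of Eq. (5), as a sketch assuming NumPy and, for the usage example, iid unit-variance returns with N = 100 and T = 400 (both sizes are arbitrary choices). Both sets of weights are evaluated with the exact matrix σ^(0); only the weights come from different matrices.

```python
import numpy as np


def min_variance_weights(sigma):
    """Eq. (3): minimum-variance weights under the budget constraint (2)."""
    z = np.linalg.solve(sigma, np.ones(sigma.shape[0]))
    return z / z.sum()


def q0(sigma_true, sigma_emp):
    """Eq. (5): sqrt of (true variance of w^(1)*) / (true variance of w^(0)*)."""
    w0 = min_variance_weights(sigma_true)    # w^(0)*, optimal under the exact matrix
    w1 = min_variance_weights(sigma_emp)     # w^(1)*, optimal under the empirical matrix
    return float(np.sqrt((w1 @ sigma_true @ w1) / (w0 @ sigma_true @ w0)))


rng = np.random.default_rng(2)
N, T = 100, 400                              # N/T = 1/4
sigma_true = np.eye(N)                       # iid unit-variance returns
x = rng.standard_normal((T, N))
value = q0(sigma_true, x.T @ x / T)          # empirical matrix from Eq. (4)
print(f"q0 = {value:.3f}")
```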

As defined in (5), q0 depends on the optimal weights corresponding to a given sample, so it is a random variable, fluctuating from sample to sample. The weights of the optimal portfolio also fluctuate. The distribution of q0 over the samples is shown in Fig. 1. Since the weights in the numerator of (5) are optimal under σ^(1)_ij, not σ^(0)_ij, q0 is always larger than or equal to one. Its deviation from unity is the measure of estimation error. As seen in Fig. 1, the average of q0 increases with N/T.

The average of q0 can, in fact, be calculated exactly by various methods. The first calculation, based on random matrix theory, was given in [10]. The result is the simple formula:

$$ \langle q_0 \rangle = \frac{1}{\sqrt{1 - N/T}}. \qquad (6) $$

Fig. 1. Distribution of q0 over the samples computed by Monte Carlo simulations. The number of variables was N = 200, and 50000 samples were generated to produce the histograms.

Fig. 2. Dependence of the distribution of q0 on the number of variables N with N/T fixed at 1/2. 50000 samples were generated to produce the histograms.
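Eq. (6) can be compared with a small Monte Carlo experiment in the spirit of Figs. 1 and 2. The sketch below assumes NumPy, iid standard normal returns, N = 100 and 200 samples per value of N/T; these sizes are far smaller than in the figures and were picked only so that the script runs quickly.

```python
import numpy as np


def q0_iid_sample(n, t, rng):
    """One draw of q0 for iid N(0,1) returns; Eq. (5) with sigma^(0) = I and w^(0)* = 1/N."""
    x = rng.standard_normal((t, n))
    s = x.T @ x / t
    z = np.linalg.solve(s, np.ones(n))
    w = z / z.sum()
    return float(np.sqrt(np.sum(w ** 2) / (1.0 / n)))


def mean_q0(n, t, samples, rng):
    return float(np.mean([q0_iid_sample(n, t, rng) for _ in range(samples)]))


rng = np.random.default_rng(3)
n = 100
results = {}
for ratio in (0.1, 0.25, 0.5, 0.75):
    t = int(round(n / ratio))
    results[ratio] = mean_q0(n, t, samples=200, rng=rng)
    print(f"N/T = {ratio:.2f}: simulated <q0> = {results[ratio]:.3f}, "
          f"Eq. (6) gives {1 / np.sqrt(1 - ratio):.3f}")
```

Rounding T to an integer shifts the effective N/T slightly, which matters most close to the critical value.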



Eq. (6) shows that the average estimation error diverges at the critical value N/T = 1 of the ratio. The divergence of q0 is the manifestation of an algorithmic phase transition, related to the appearance of zero modes in the empirical covariance matrix in the limit N/T → 1. We note that while for a fixed value of N/T the width of the distribution of q0 tends to zero with increasing N (i.e. q0 is a self-averaging quantity, see Fig. 3), for fixed N and N/T going to its critical value the width diverges even more strongly than the average.
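The zero modes can be seen directly in the spectrum of σ^(1). The sketch below assumes NumPy and iid unit-variance data; the comparison uses a standard random-matrix fact not stated in the text: for N/T < 1 the eigenvalues fill the Marchenko-Pastur interval, whose lower edge (1 - sqrt(N/T))^2 closes at N/T = 1, while for N/T > 1 the matrix has rank T and hence N - T exact zero eigenvalues.

```python
import numpy as np


def spectrum_edge(n, t, rng):
    """Smallest eigenvalue and rank of the empirical covariance of n iid N(0,1) series of length t."""
    x = rng.standard_normal((t, n))
    s = x.T @ x / t
    lam_min = np.linalg.eigvalsh(s)[0]   # eigvalsh: ascending eigenvalues of a symmetric matrix
    return float(lam_min), int(np.linalg.matrix_rank(s))


rng = np.random.default_rng(3)
n = 200
results = {}
for ratio in (0.25, 0.5, 0.9, 1.5):
    t = int(round(n / ratio))
    lam_min, rank = spectrum_edge(n, t, rng)
    mp_lower = (1 - np.sqrt(ratio)) ** 2 if ratio < 1 else 0.0
    results[ratio] = (lam_min, rank, t)
    print(f"N/T = {ratio:4}: T = {t:4d}, rank = {rank:3d}, "
          f"lambda_min = {lam_min:.4f}, MP lower edge = {mp_lower:.4f}")
```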

For iid variables the true covariance matrix is proportional to the unit matrix, and the true portfolio weights are all equal to 1/N; therefore (5) works out to be

$$ q_0 = \sqrt{ N \sum_{i=1}^{N} \left( w_i^{(1)*} \right)^2 }. \qquad (7) $$



The large fluctuations of q0 are thus related to the large fluctuations in the length of the solution vector. The individual components of this vector, the portfolio weights, show strong fluctuations already for relatively small values of N/T, such as 1/5 or 1/3. It is easy to show that the standard deviation of the weights diverges at N/T = 1 with the same index −1/2 as q0. This means that the optimization hardly determines the weights even far from the critical point!
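One way to see this, offered as our own back-of-the-envelope reading rather than the paper's derivation: since the weights sum to one, their mean is exactly 1/N, and Eq. (7) can be rewritten as q0^2 = 1 + N^2 s_w^2, where s_w is the cross-sectional standard deviation of the weights in one sample. Using self-averaging and Eq. (6), N s_w ≈ sqrt((N/T)/(1 - N/T)), which is about 50% of the average weight at N/T = 1/5, about 71% at N/T = 1/3, and diverges with index −1/2 at N/T = 1. The sketch below (NumPy, iid normal returns, N = 300, all assumptions) checks the identity exactly and the approximation numerically.

```python
import numpy as np


def weight_dispersion(n, t, rng):
    """Return (N * cross-sectional std of w^(1)*, q0) for one iid N(0,1) sample."""
    x = rng.standard_normal((t, n))
    s = x.T @ x / t
    z = np.linalg.solve(s, np.ones(n))
    w = z / z.sum()                          # weights average exactly to 1/N
    q0 = np.sqrt(n * np.sum(w ** 2))         # Eq. (7)
    return float(n * np.std(w)), float(q0)


rng = np.random.default_rng(4)
n = 300
results = {}
for ratio in (0.2, 1 / 3, 0.5, 0.9):
    t = int(round(n / ratio))
    disp, q0 = weight_dispersion(n, t, rng)
    results[ratio] = (disp, q0)
    print(f"N/T = {ratio:.2f}: N*std(w) = {disp:.3f}, "
          f"sqrt(q0^2 - 1) = {np.sqrt(q0 ** 2 - 1):.3f}, "
          f"sqrt((N/T)/(1 - N/T)) = {np.sqrt(ratio / (1 - ratio)):.3f}")
```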

It is evident that the above divergences are related to an unrealistic feature of our optimization task: the budget constraint allows arbitrarily large short positions, as long as they are compensated by similarly large long positions. Accordingly, when the cost function becomes flat in the limit N/T → 1, the solution vector may display arbitrarily large fluctuations. A ban on short selling, that is, demanding that all the weights be nonnegative, will eliminate these fluctuations, which explains the observation that a ban on short selling acts as a regulator [11]. We might then believe that via the constraint w_i ≥ 0 we can save the optimization task. This is not so: the critical fluctuations will indeed be eliminated, but as we approach the value N/T = 1, and even more as we go into the region N/T > 1, we find that an increasing number of weights get stuck to the boundaries of the allowed region, that is, a larger and larger number of weights become zero. This phenomenon of spontaneous reduction of portfolio size is well known [12], and is easy to understand on the basis of our geometric picture: with increasing N/T an ever increasing number of zero modes appear, the cost function becomes flat in more and more directions, and the solution would like to run away to infinity along these soft directions. It is, however, prevented from doing so by the positivity constraints imposed on the weights, so it gets glued to the coordinate planes representing these constraints. Moreover, the fluctuations from sample to sample now mean that the solution randomly jumps about the coordinate planes [13]; therefore the solution of the problem will not reflect any stable, objective structure of the cost function. Clearly, any other set of constraints that makes the domain of the optimization bounded will have a similar effect [13].
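The spontaneous reduction of portfolio size can be reproduced with a few lines of code. The sketch assumes NumPy, iid normal returns with T = 100, and a plain projected-gradient solver (step 1/L, Euclidean projection onto the simplex by the usual sort-based algorithm); the solver is our choice for illustration, and any quadratic-programming routine would serve. Because the projection sets coordinates to exactly 0.0, the weights glued to the coordinate planes can simply be counted.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection onto {w : w_i >= 0, sum_i w_i = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = k[u - (css - 1.0) / k > 0][-1]     # largest k with a positive shifted entry
    theta = (css[rho - 1] - 1.0) / rho
    return np.maximum(v - theta, 0.0)


def long_only_min_variance(s, iters=5000):
    """Minimize w^T S w subject to sum(w) = 1, w >= 0, by projected gradient descent."""
    n = s.shape[0]
    step = 1.0 / (2.0 * np.linalg.eigvalsh(s)[-1])   # gradient 2 S w has Lipschitz constant 2 lambda_max
    w = np.full(n, 1.0 / n)
    for _ in range(iters):
        w = project_to_simplex(w - step * 2.0 * s @ w)
    return w


rng = np.random.default_rng(5)
t = 100
zeros, weights = {}, {}
for n in (20, 60, 90):
    x = rng.standard_normal((t, n))
    w = long_only_min_variance(x.T @ x / t)
    weights[n] = w
    zeros[n] = int(np.sum(w == 0.0))
    print(f"N/T = {n / t:.2f}: {zeros[n]} of {n} weights are exactly zero")
```

Rerunning with different seeds shows which weights vanish changing from sample to sample, in line with the random jumping about the coordinate planes described above.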

So far we have been considering the oversimplified case of iid variables. What happens if we relax this condition? Experimenting with various market models (one-factor, market plus sectors, positive and negative covariances, etc.) we have come to the conclusion that the main features of the task do not change; in particular, the exponent of q0 remains the same [3]. This is a manifestation of universality: the critical index is invariant to the microscopic details of the cost function. In fact, universality goes much farther than this. It has been shown that the exponent of q0 is also largely independent of the nature of the underlying process and remains the same for fat-tailed distributions with a tail index α > 2 [14]; moreover, it remains the same even for a special type of nonstationary process, the Constant Conditional Correlation GARCH process [6]. The phase transition concept can thus incorporate a number of different aspects of the problem (independence of the risk measures and the underlying process, reduction of diversification for banned short selling) into a single coherent picture.
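A quick numerical illustration of this insensitivity, at the level of the average only: the sketch compares the mean of q0 for iid Gaussian returns and for iid Student t returns with 5 degrees of freedom (tail index 5 > 2) against Eq. (6). The distributions, sizes (N = 100, 200 samples) and seed are assumptions; the claim in the text concerns the critical exponent, which a single-size simulation like this does not measure. Since q0 is scale-invariant, the t variance 5/3 needs no rescaling.

```python
import numpy as np


def gaussian(rng, size):
    return rng.standard_normal(size)


def student_t5(rng, size):
    # Student t with 5 degrees of freedom: power-law tails with index 5 > 2, finite variance.
    return rng.standard_t(5, size=size)


def mean_q0(sampler, n, t, samples, rng):
    """Average of Eq. (5) for iid returns; the true covariance is proportional to I, so w^(0)* = 1/N."""
    vals = []
    for _ in range(samples):
        x = sampler(rng, (t, n))
        s = x.T @ x / t                      # Eq. (4), zero mean assumed known
        z = np.linalg.solve(s, np.ones(n))
        w = z / z.sum()
        vals.append(np.sqrt(n * np.sum(w ** 2)))
    return float(np.mean(vals))


rng = np.random.default_rng(6)
n = 100
results = {}
for ratio in (0.25, 0.5):
    t = int(round(n / ratio))
    g = mean_q0(gaussian, n, t, 200, rng)
    st = mean_q0(student_t5, n, t, 200, rng)
    results[ratio] = (g, st)
    print(f"N/T = {ratio}: Gaussian {g:.3f}, Student t(5) {st:.3f}, "
          f"Eq. (6) {1 / np.sqrt(1 - ratio):.3f}")
```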



3 A wider context 

We have seen that the estimation error of the optimal portfolio risk diverges as the number of observations approaches the number of stocks in the portfolio. Obviously, this estimation instability is not restricted to portfolio optimization, but occurs in a very wide range of applications, also beyond finance. In fact, whenever a phenomenon is influenced by a large number of factors, but we have a limited amount of information about this dependence, we have to expect that the estimation error will diverge and fluctuations over the samples will be huge. To show this, we first consider a slightly generalized version of the optimization problem in the previous Section by introducing a set of linear constraints, then point out that linear regression can also be cast into this form. So the instability due to information deficit is present not only in optimization, but also in model building.

Let us then consider the following class of constrained quadratic optimization problems:

$$ \min_{\{w_i\}} \left( \sum_{i,j=1}^{N} \sigma_{ij} w_i w_j + \sum_{i=1}^{N} h_i w_i \right) \quad \text{subject to} \quad \sum_{j=1}^{N} A_{kj} w_j = b_k, \quad k = 1, 2, \ldots, K, \qquad (8) $$

where we assume that the number K of constraints is smaller than N, rank(A) = K, and σ is a symmetric, positive definite matrix. Important special cases of this general framework are the following:

1. Portfolio optimization: σ_ij = Cov(x_i, x_j) (i, j = 1, ..., N), where x_i is the return on asset i and the variables w_i are the portfolio weights. The coefficients h_i are all zero, and the constraints may vary according to the kind of optimum we are looking for. Two particular cases deserve special mention:

(a) Global minimum variance portfolio: K = 1 and A_1j = 1 for all j = 1, 2, ..., N, which is precisely the simple case considered in the previous Section.

(b) Mean-variance portfolio optimization: K = 2, A_1j = 1 and A_2j = μ_j for all j = 1, 2, ..., N, where μ_j is the expected return of asset j; this is the standard textbook example of portfolio optimization.

2. Linear regression: consider the regression equation

$$ y = w_0 + \sum_{i=1}^{N} w_i x_i + \varepsilon. \qquad (9) $$

The task is to minimize the variance of the deviation ε over the regression coefficients w_i, so we have a quadratic optimization problem with σ_ij = Cov(x_i, x_j) (i, j = 1, ..., N), K = 0 (no constraints), and h_i = -2 Cov(x_i, y) (up to the additive constant Var(y)).

It is well known that problems conforming to (8) can be transformed into several equivalent forms with fewer or more constraints via linear transformations of the variables and parameters, and/or by eliminating or introducing new variables (Lagrange multipliers). Thus, any quadratic optimization problem of the form (8) can be rephrased either as a portfolio optimization or as a linear regression problem, which allows results obtained in one field to be transferred to the other [15]. On the other hand, we can always regard the cost function in (8) as a simple quadratic Hamiltonian, so the task set forth in (8) is nothing but finding the ground state energy of that Hamiltonian. Introducing a fictitious temperature 1/β, we can then turn the optimization problem into one in statistical mechanics, an idea that provided the key to the method of simulated annealing [16]. The partition function of this statistical mechanics problem is:

$$ Z = \int \prod_{l=1}^{N} dw_l \, \exp\left[ -\beta \left( \sum_{i,j=1}^{N} \sigma_{ij} w_i w_j + \sum_{i=1}^{N} h_i w_i \right) \right] \prod_{k=1}^{K} \delta\left( \sum_{j=1}^{N} A_{kj} w_j - b_k \right). \qquad (10) $$

The solution to the original problem can then be recovered in the zero-temperature (β → ∞) limit.
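For the unconstrained case K = 0 (the regression case), the zero-temperature limit of the partition function can be made explicit by completing the square. The lines below are a standard Gaussian-integral calculation added for illustration, not a result quoted from the paper; they use vector notation w = (w_i), h = (h_i) and assume σ positive definite:

$$ \mathcal{H}(w) = w^{T} \sigma w + h^{T} w = (w - w^{*})^{T} \sigma (w - w^{*}) - \tfrac{1}{4} h^{T} \sigma^{-1} h, \qquad w^{*} = -\tfrac{1}{2} \sigma^{-1} h, $$

$$ Z = \left( \frac{\pi}{\beta} \right)^{N/2} (\det \sigma)^{-1/2} \exp\left( \frac{\beta}{4} h^{T} \sigma^{-1} h \right), $$

$$ -\lim_{\beta \to \infty} \frac{1}{\beta} \ln Z = -\tfrac{1}{4} h^{T} \sigma^{-1} h = \min_{w} \mathcal{H}(w). $$

At any finite β the Boltzmann weight is a Gaussian centred on w* with covariance (2βσ)^{-1}, so the thermal fluctuations around the ground state vanish as β → ∞. For regression, h_i = -2 Cov(x_i, y) gives w* = σ^{-1} Cov(x, y), the familiar least-squares slopes.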

Just as in finance, in most other disciplines the parameters σ_ij, h_i and A_kj are partially or fully estimated from empirical samples, hence they fluctuate from sample to sample. Therefore, we again face a situation where the estimated minimum of the objective function is actually suboptimal. The amount of measurement noise can be quantified by the same q0 as we used before.

Let us denote the cost function (Hamiltonian) by H(w_i, α_i), where we optimize over the variables w_i. The vector α_i includes all fixed parameters of the problem (σ_ij, h_i and A_kj). In theory, the optimum can be expressed as a function of the fixed parameters. In practice, however, the true values α_i^(0) of some or all of these parameters are unknown (and so are the true optimal weights w_i^(0)*); all we can have is a sample-based estimator α_i^(1), and the corresponding estimated optimum will be w_i^(1)*. The measure of estimation error can then generally be defined as:

$$ q_0 = \sqrt{ \frac{ \mathcal{H}\left( w_i^{(1)*}, \alpha_i^{(0)} \right) }{ \mathcal{H}\left( w_i^{(0)*}, \alpha_i^{(0)} \right) } }. \qquad (11) $$

It is easy to see that the quantity defined here is equivalent to the q0 introduced in the first part of this paper.

Let us interpret this result for the problem of linear regression. In this case, the cost function is the mean squared error of the linear model, that is, the variance of the difference between the observed value and the predicted value of the dependent variable. The denominator of (11) is the theoretical minimum of this variance, characterizing the line/plane/hyperplane that best fits the true model (i.e. the equation with the true parameter values). On the other hand, the quantity in the numerator is the expected value of the mean squared error of the estimated model. In other words, it characterizes the prediction error of the model with the estimated (and hence suboptimal) parameter values. This latter quantity can never be smaller than the former one, so q0 − 1 represents the relative increase in the prediction error due to sampling noise.
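The regression reading of q0 can be simulated in the same way as the portfolio case. The sketch below assumes NumPy, iid standard normal regressors (so the true Cov(x) is the unit matrix), Gaussian noise, and hypothetical random true slopes; it fits the slopes by minimizing the quadratic form (8) with σ_ij = Cov(x_i, x_j) and h_i = -2 Cov(x_i, y), checks that this reproduces ordinary least squares, and then evaluates Eq. (11) as the ratio of out-of-sample deviation variances. In this simple setting the averages come out close to 1/sqrt(1 - N/T), the same form as Eq. (6).

```python
import numpy as np


def fit_via_quadratic_form(x, y):
    """Minimize sum_ij s_ij w_i w_j + sum_i h_i w_i with s = Cov(x), h = -2 Cov(x, y).

    The stationarity condition 2 s w + h = 0 gives w = s^{-1} Cov(x, y), i.e. the OLS slopes.
    """
    xc = x - x.mean(axis=0)
    yc = y - y.mean()
    t = x.shape[0]
    s = xc.T @ xc / t                        # sample estimate of sigma_ij = Cov(x_i, x_j)
    h = -2.0 * (xc.T @ yc) / t               # sample estimate of h_i = -2 Cov(x_i, y)
    return np.linalg.solve(2.0 * s, -h)


def regression_q0(n, t, rng, noise_sd=1.0):
    """Eq. (11) for regression with iid N(0,1) regressors, so the true Cov(x) is the unit matrix."""
    beta = rng.standard_normal(n)            # hypothetical true slopes
    x = rng.standard_normal((t, n))
    y = 0.5 + x @ beta + noise_sd * rng.standard_normal(t)
    b = fit_via_quadratic_form(x, y)
    # Variance of the out-of-sample deviation y - b^T x under the true model:
    # noise variance plus (b - beta)^T Cov(x) (b - beta), with Cov(x) = I here.
    var_est = noise_sd ** 2 + np.sum((b - beta) ** 2)
    var_opt = noise_sd ** 2                  # attained by the true slopes
    return float(np.sqrt(var_est / var_opt))


rng = np.random.default_rng(7)
n = 50
results = {}
for ratio in (0.25, 0.5):
    t = int(round(n / ratio))
    results[ratio] = float(np.mean([regression_q0(n, t, rng) for _ in range(300)]))
    print(f"N/T = {ratio}: simulated <q0> = {results[ratio]:.3f}, "
          f"1/sqrt(1 - N/T) = {1 / np.sqrt(1 - ratio):.3f}")
```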

In the financial context, the properties of q0 have been extensively investigated PQ, [17], [3]. An analytic calculation of the mean of q0 has been carried out in [10] using results borrowed from random matrix theory. On the other hand, computing the expectation of q0 means averaging over several samples generated by the same underlying process. This is completely analogous to quenched averaging in spin glasses, where one has to compute the expected value of the logarithm of the partition function. This computation can be performed using the replica trick. Indeed, the replica method has been employed successfully to investigate the instability of q0 for different risk measures [4], [5]. We have rederived the average of q0 for the variance as risk measure also for correlated Gaussian underlying processes, and confirmed the numerical finding in [17] that asymptotically the behaviour of q0 is the same as for the iid normal case considered in [10]. The calculation of the higher moments of q0 is underway [18].



4 Summary 

We have addressed the problem of estimation error in the context of portfolio optimization. Statistical methods work perfectly when the portfolio size is small and the sample is large. Banking portfolios are almost never like that; the 'thermodynamic limit', where the portfolio size and the sample size go to infinity with a fixed ratio between them, often represents the actual situation better. We found that at a critical value of this ratio the sampling error diverges. This is an algorithmic phase transition, displaying universal power-law-like divergences. We have seen that this phase transition picture is more than a somewhat fancy description of the well-known fact that the empirical covariance matrix develops zero eigenvalues for short time series: the phase transition concept encompasses a number of seemingly diverse phenomena and organizes them into a coherent picture, and it also encourages one to carry over the extremely powerful methods of statistical mechanics to the optimization problem. Via the relationship between quadratic optimization and linear regression, all these results can be extended to the latter as well.

Complex systems depend on a large number of parameters; they are, by definition, intrinsically high-dimensional. Collecting sufficient data about them is often prohibitively expensive or practically impossible. Therefore one typically has to face an information deficit catastrophe similar to the one described above, and the estimation error and the sample-to-sample fluctuations will be huge. The description and modeling of such systems may benefit from the link between statistical mechanics, optimization and regression described in this paper.